CGI Programming FAQ

The Table of contents starts at 0 (preamble). Older (pre-HTML 3) or broken browsers may start it at 1: apologies for any confusion.

Preamble
Basic Questions
HTTP Headers and NPH Scripts
Techniques: "How do I..."
Applications: Is there an existing script to ...
Troubleshooting a CGI application
1. Are there some interactive debugging tools and services available?
2. I'm having trouble with my headers. What can I do?
Further Reading
1. Other FAQs/collections (including online book)
2. Reference Pages

INDEX

Section 0: Preamble

NOTE: the Reply-to address in this FAQ is an autoresponder.   If you
want to write to me, you'll have to set the "To:" line by hand:
mailto:nick.kew@pobox.com

NOTE: the numbering in this document is automatically generated by my
posting software, and will change between postings if new questions are
added (as _may_ happen when I see a FAQ I've previously overlooked :-)

0.1: Changes

Last Modified: March 6th 1996:

* Added WDG website to 'where to get this document'
* Updated "verifying email" entry yet again (why won't people take
  "NO" for an answer?)

[Table of Contents] [Index]

0.2: Notice and Disclaimer

Copyright 1996/7 Nick Kew.

You are free to copy or distribute this document in whole or in part
for any purpose and on any medium you choose, provided: 

      You DON'T do so for profit.
      You DO include this notice and disclaimer in full.

Disclaimer: This information is offered in good faith and in the hope
that it may be of use, but is not guaranteed to be correct, up to date
or suitable for any particular purpose.   The author accepts no liability
in respect of this information or its use.


0.3: Where to get this document

The homes of this document on the Web are now
* the WebThing WebCentre, at http://pobox.com/%7Ewebthing/
	URL  http://pobox.com/%7Ewebthing/docs/cgi/faqs/cgifaq.shtml
* the WDG website, at http://htmlhelp.com/
	URL  http://htmlhelp.com/faq/cgifaq.html

The WebThing is an interactive site, using CGI software that permits
readers to comment on, and contribute to, the FAQ itself.
See next question.

NOTE - If you want to mirror the FAQ on your WWW site, the best document
to use is the HTML version from htmlhelp or from my autoresponder
(see below).   If you're putting it on a publicly-visible server, please
make sure you keep it up-to-date (if you let me know you have it,
I can automate the updates).

Other known sources are:

(1) USENET: posted to newsgroups				(TEXT)
	news:comp.infosystems.www.authoring.cgi
	news:comp.answers
	news:news.answers

(2) RTFM and mirror sites					(TEXT)
	ftp://rtfm.mit.edu/pub/usenet/news.answers/www/cgi-faq

(3) RTFM WWW mirror sites, including			(Partial HTML)
	Europe - http://www.cs.ruu.nl/cgi-bin/faqwais 
	America - http://www.cis.ohio-state.edu/hypertext/faq/usenet/

(4) By EMAIL from my autoresponder 			(HTML or TEXT)
	Send blank email for info: currently it will respond to
	subject lines:
		send cgifaq.txt
	or
		send cgifaq.html
	but these may have changed if you're reading a saved copy.
	mailto:satfaq@pobox.com

(5) By EMAIL from the FAQserver at RTFM 			(TEXT)
	Send email to mailto:mail-server@rtfm.mit.edu with
		send usenet/news.answers/www/cgi-faq
	in the body of your message


0.4: How to contribute to this document?

The WebThing software permits collaborative authoring using your web
browser. When you are reading any entry in this InterFAQ, you can add a
new entry which will then appear as another "more on" subject.
http://pobox.com/%7Ewebthing/

In order to maintain the quality of the FAQ, and avoid inappropriate
'commercial' entries, write permission is limited using an Access Control
List. If you have a contribution to make, send me an email including your
WebThing userid (i.e. what you entered in the registration form) and I'll
add you to the list.

InterFAQ readers - If your browser isn't showing a "new entry" button, then
either you aren't logged in or you're not on the access control list.

Note that this InterFAQ is limited to questions-and-answers appropriate to
periodic Usenet posting. Other types of contribution can be added
elsewhere in the WebCentre. For example

    * If you have a relevant website and want to link to it, enter it in the
      appropriate collection (e.g. "scripts" or "misc").    You can then
      also include a description of your site, and have it indexed.
    * If you want to post a question or comment on something in this
      document, you can post it as a followup to the "flat" version of the
      FAQ (library document in the "FAQS" collection). 

If you don't want to use the InterFAQ you can always mail me
( mailto:nick.kew@pobox.com )


0.5: Can I email the author my questions?

I already get more email than I can possibly answer personally, so
in general the answer is no - I'm NOT a free advice centre.
The possible exception is when something already in the FAQ needs
clarifying: don't expect a personal reply, but I *might* add
something to the answer in question, so check the next posting (or three).

The newsgroup is the appropriate place for free advice.   But remember:
bad questions usually get bad answers, so think carefully before posting.


0.6: What's up with posting to comp.infosystems.www.authoring.cgi?

This is now a moderated newsgroup.   The moderator is a bot run by
Thomas Boutell ( mailto:boutell@boutell.com ).   The charter for
moderation is as follows:

  This newsgroup is self-moderated.  Your first posting will not appear
  until you have read and responded to an automatic welcome mailing, at
  which point your posting will appear with no further delay.  Provision
  will also be made to automatically approve first postings that contain
  a header requesting this.  Subsequent postings are approved
  automatically.

If posting normally doesn't work - as could be the case if your
newsfeed has trouble with moderated groups - you can post articles
by emailing them to:
	mailto:authoring-cgi@boutell.com
Provided the return address in your mail is correct, you will then
receive precise instructions for having your post(s) automatically approved.

Alternative means of posting are detailed in the WWW FAQ, posted
regularly by Thomas Boutell.


0.7: Credits

This FAQ was written by Nick Kew, and has been considerably improved
with the help of comments and criticisms, newsgroup posts and
miscellaneous suggestions from Nathan Neulinger, Maurice L. Marvin,
Matthew Healy and Alan J. Flavell.


Section 1: Basic Questions

This section aims to deal with basic questions, addressing the role and
nature of CGI, and its place in Web programming. Questions/answers which
just don't appear to 'fit' under any other section may also be included
here.

1.1: What is CGI?

[ from the CGI reference http://hoohoo.ncsa.uiuc.edu/cgi/overview.html ]

The Common Gateway Interface, or CGI, is a standard for external
gateway programs to interface with information servers such as HTTP servers.
A plain HTML document that the Web daemon retrieves is static,
which means it exists in a constant state: a text file that doesn't change.
A CGI program, on the other hand, is executed in real-time, so that it
can output dynamic information.


1.2: Is it a script or a program?

The distinction is semantic.   Traditionally, compiled executables
(binaries) are called programs, and interpreted programs are usually
called scripts.   In the context of CGI, the distinction has become
even more blurred than before.   The words are often used interchangeably
(including in this document).   Current usage favours the word "scripts"
for CGI programs.


1.3: When do I need to use CGI?

There are innumerable caveats to this answer, but basically any
Webpage containing a form will require a CGI script or program
to process the form inputs.


1.4: Should I use CGI or JAVA?

[This answer to a non-question is an attempt to reduce the noise level of
the recurrent "CGI vs JAVA" threads.]

CGI and JAVA are fundamentally different, and for most applications
are NOT interchangeable.   Nor are the two mutually exclusive: you could
in principle write a CGI program in JAVA, although it is hard to
think of an instance where this would be the best choice.

CGI is a mechanism for running programs on a WWW server.
Typical applications include accessing a database, submitting
an order, or posting messages to a bulletin board.
JAVA enables programs to run on the Client machine, and is
suited to such tasks as detailed manipulation of an image.
Alternatives to JAVA may include the X windows client/server
protocol, use of browser plugins and helper applications, and
other clientside languages such as SafeTCL and perl/penguin.

In certain instances the two may be combined in a single application:
for example a JAVA applet to define a region of interest from a
geographical map, together with a CGI script to process a query
for the area defined.


1.5: Should I use CGI or SSI?

CGI and SSI (Server-Side Includes) are often interchangeable, and it may
be no more than a matter of personal preference.   Here are a few
guidelines:
  1) CGI is a common standard agreed and supported by all major HTTPDs.
     SSI is NOT a common standard, but an innovation of NCSA's HTTPD
     which has been widely adopted in later servers.   CGI has the
     greatest portability, if this is an issue.
  2) If your requirement is sufficiently simple that it can be done
     by SSI without invoking an exec, then SSI will probably be
     more efficient.   A typical application would be to include
     sitewide 'house styles', such as toolbars, netscapeised <body>
     tags or embedded CSS stylesheets.
  3) For more complex applications - like processing a form -
     where you need to exec (run) a program in any case, CGI
     is usually the best choice.


1.6: Should I use CGI or an API?

APIs are proprietary programming interfaces supported by particular
platforms.   By using an API, you lose all portability.   If you know
your application will only ever run on one platform (OS and HTTPD),
and it has a suitable API, go ahead and use it.   Otherwise stick to CGI.


1.7: What do I absolutely need to know?

If you're already a programmer, CGI is extremely straightforward, and just
three resources should get you up to speed in the time it takes to read them:
  1) Installation notes for your HTTPD.   Is it configured to run CGI
     scripts, and if so how does it identify that a URL should be executed?
     (Check your manuals, READMEs, ISP webpages/FAQS, and if you still can't
     find it ask your server administrator).
  2) The CGI specification at NCSA tells you all you need to know
     to get your programs running as CGI applications.
     http://hoohoo.ncsa.uiuc.edu/cgi/interface.html
  3) WWW Security FAQ.   This is not required to 'get it working', but
     is essential reading if you want to KEEP it working!
     http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html

If you're NOT already a programmer, you'll have to learn.   If you would
find it hard to write, say, a 'grep' or 'cat' utility to run from the
commandline, then you will probably have a hard time with CGI.   Make
sure your programs work from the commandline BEFORE trying them with CGI,
so that at least one possible source of errors has been dealt with.
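
One way to do that is to give your program, by hand, the same environment
the server would give it.   A minimal sketch, assuming a Bourne-compatible
shell: run_cgi_get and run_cgi_post are hypothetical helper names of my
own, and ./myscript.cgi stands for whatever you are testing.

```shell
#!/bin/sh
# Hypothetical helpers for exercising a CGI program from the command line.
# They set only the handful of CGI variables most form handlers look at;
# a real server sets many more.

# run_cgi_get QUERY COMMAND [ARGS...]: simulate a GET with the given query string
run_cgi_get() {
    query=$1
    shift
    REQUEST_METHOD=GET QUERY_STRING="$query" "$@"
}

# run_cgi_post BODY COMMAND [ARGS...]: simulate a form POST, BODY arrives on stdin
# (the length is counted in characters, so keep test bodies to plain ASCII)
run_cgi_post() {
    body=$1
    shift
    printf '%s' "$body" | REQUEST_METHOD=POST \
        CONTENT_TYPE=application/x-www-form-urlencoded \
        CONTENT_LENGTH=${#body} "$@"
}

# Example use:
#   run_cgi_get  'name=fred&age=42' ./myscript.cgi
#   run_cgi_post 'name=fred&age=42' ./myscript.cgi
```

If the output starts with a Content-type line followed by a blank line,
one common source of server errors is already ruled out.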


1.8: Does CGI create new security risks?

Yes.   Period.
There is a lot you can do to minimise these.   The most important thing
to do is read and understand Lincoln Stein's excellent WWW security
FAQ, at http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html .


1.9: Do I need to be on Unix?

No, but it helps.   The Web, along with the Internet itself, C, Perl,
and almost every other Good Thing in the last 20 years of computing,
originated in Unix.   At the time of writing, this is still the
most mature and best-supported platform for Web applications.


1.10: Do I have to use Perl?

No - you can use any programming language you please.   Perl is simply
today's most popular choice for CGI applications.   Some other widely-
used languages are C, TCL, BASIC and - for simple tasks - even shell scripts.

Reasons for choosing Perl include its powerful text manipulation
capabilities (in particular regular expressions) and the fantastic
WWW support modules available.


1.11: Do I have to put it in cgi-bin?

See next question.


1.12: Do I have to call it .cgi? .pl?

Maybe.   It depends on your server installation.

These types of filenames are commonly used conventions - no more.
It is up to the server administrator whether or not CGI scripts are
enabled, and (if so) what conventions tell the server to run or
to print them.

If you are running your own server, read the manual.
If you're on an ISP's or other rented webspace, check their webpages for
information or FAQs.   As a last resort, ask the server administrator.


1.13: What is CGIWrap, and how does it affect my program?

[ quoted from http://www.umr.edu/~cgiwrap/intro.html ]

> CGIWrap is a gateway program that allows general users to use CGI scripts
> and HTML forms without compromising the security of the http server.
> Scripts are run with the permissions of the user who owns the script. In
> addition, several security checks are performed on the script, which will not
> be executed if any checks fail. 
> 
> CGIWrap is used via a URL in an HTML document. As distributed, cgiwrap
> is configured to run user scripts which are located in the
> ~/public_html/cgi-bin/ directory. 

See http://www.umr.edu/~cgiwrap/


Section 2: HTTP Headers and NPH Scripts

This is a fairly technical section dealing with HTTP, the protocol of
the Web. It also includes NPH, the mechanism by which CGI programs can
return HTTP header information directly to the Client.

2.1: What is HTTP (HyperText Transfer Protocol)?

HTTP is the protocol of the Web, by which Servers and Clients (typically
browsers) communicate.  An HTTP transaction comprises a Request sent by
the Client to the Server, and a Response returned from the Server to
the Client.
Every HTTP request and response includes a message header, describing
the message.   These are processed by the HTTPD, and may often be
mostly ignored by CGI applications (but see below).
A message body may also be included:
  1) A HEAD or GET request sends only a header.   Any form data is encoded
     in the query string of the requested URL (the part after the "?"),
     which is available to the CGI program as the environment variable
     QUERY_STRING.
  2) A POST request sends both header and body.   The body typically
     comprises data entered by a user in a form.
  3) A HEAD request does not expect a body in the response.
  4) A GET or POST request will accept a response with or without a body,
     according to the header.   The body of a response is typically an
     HTML document.


2.2: What HTTP request headers can I use?

Most HTTP request headers are passed to the CGI script as environment
variables.   Some are guaranteed by the CGI spec.   Others are server,
browser and/or application dependent.

To see what _your_ browser and server are telling each other, just use
a trivial little CGI script to print out the environment.   In Unix:
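	#!/bin/sh
	# headers first, then a blank line, then the body
	echo "Content-type: text/plain"
	echo
	# 'env' lists the environment passed in by the server
	# ('set' would also list the shell's own internal variables)
	env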

(Just call it "env.cgi" or something, and put it where your server
will execute it.   Then point your browser at
http://your.server/path/to/env.cgi ).

This enables you to see at-a-glance what useful server variables are set.
Note that dumping the environment like this within a more complex
script can be a useful debugging technique.

For details, see the CGI Environment Variables specification at
http://hoohoo.ncsa.uiuc.edu/cgi/env.html
(which also includes a version of the above script - somewhat more
nicely formatted - online).
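
If the full dump is too noisy, you can restrict it to the request headers.
The sketch below relies on the CGI convention that each request header
is passed as a variable named HTTP_ followed by the header name in upper
case, with "-" changed to "_".   The function name list_request_headers
is my own invention.

```shell
#!/bin/sh
# Print only the variables derived from HTTP request headers,
# e.g. HTTP_USER_AGENT, HTTP_ACCEPT, HTTP_REFERER.
list_request_headers() {
    env | grep '^HTTP_' | sort
}

echo "Content-type: text/plain"
echo
list_request_headers
```

Which headers turn up depends entirely on the browser and server, which
is the point of running it against your own.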


2.3: What Environment variables are available to my application?

See previous question.   Those you can rely on are documented in NCSA's
pages; those associated with your particular server and browser can
be determined using the above script.


2.4: What HTTP response headers do I need to know about?

Unless you are using NPH, the HTTPD will insert necessary response
headers on your behalf, always provided it is configured to do so.

However, it is conventional for servers to insert the Content-type header
based on a page's filename, and for CGI scripts it will often be absent or
wrong.  Hence the usual advice is to print an explicit Content-type header.

Some other headers you may wish to use explicitly are:
Status		(to set HTTP return code explicitly.   Caveats:
		   (1) Behaviour is undefined if it conflicts with
		   another header. (2) This is NOT an HTTP header.)
Location	(to redirect the user to another URI, which may or may
		not be on your own server)
Set-cookie	(Netscape/Nonstandard) Set a cookie
Refresh		(Netscape/Nonstandard) Clientpull

You can also use general MIME headers: eg "Keywords" for the benefit of
indexers (although in this instance some major search robots have
regrettably introduced a new protocol to do the same thing).

The 'official' list of HTTP response headers is at
http://www.w3.org/pub/WWW/Protocols/HTTP/Object_Headers.html
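
As an illustration, here is a sketch of an ordinary (non-NPH) script that
sets the return code and the content type itself.   The function name
send_response and the message text are made up for the example; how
the server treats the Status line is for your server's documentation
to confirm.

```shell
#!/bin/sh
# send_response STATUS TYPE BODY
# Prints a parsed-header CGI response: the server reads these lines,
# turns "Status" into the real HTTP status line, and adds its own headers.
send_response() {
    printf 'Status: %s\n' "$1"
    printf 'Content-type: %s\n' "$2"
    printf '\n'                 # a blank line ends the headers
    printf '%s\n' "$3"
}

send_response "404 Not Found" "text/plain" "No such report."
```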


2.5: What is NPH?

NPH = No Parsed Headers.   The script undertakes to print the entire
HTTP response including all necessary header fields.   The HTTPD
is thereby instructed not to parse the headers (as it would normally do)
nor add any which are missing.


2.6: Must/should/can I write nph scripts?

Generally, no.   It is usually better to save yourself hassle by letting
the HTTPD produce the headers for you.

If you are going to use NPH, be sure to read and understand the HTTP spec at
http://www.w3.org/pub/WWW/Protocols/

Your headers should be complete and accurate, because you're instructing
the HTTPD not to correct them or insert what's missing.

Possible circumstances where the use of NPH is appropriate are:
  * When your headers are sufficiently unusual that they might be
    differently parsed by different HTTPDs (eg combining "Location:"
    with a "Status:" other than 302).
  * When returning output over a period of time (eg displaying
    unbuffered results of a slow operation in 'real' time).
See http://www.w3.org/pub/WWW/Protocols/HTTP/HTRESP.html
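
The second case might look something like the sketch below.   It assumes
the script is installed under a name your server treats as NPH; the
function names and the three-step "slow operation" are stand-ins of mine,
not anything from the references above.

```shell
#!/bin/sh
# Sketch of an NPH script that reports progress as it goes.

nph_headers() {
    # The script, not the server, supplies the status line and every
    # header, with the CRLF line endings HTTP specifies.
    printf 'HTTP/1.0 200 OK\r\n'
    printf 'Content-type: text/plain\r\n'
    printf '\r\n'
}

# slow_report DELAY: stand-in for a slow job, one line per finished step
slow_report() {
    for step in 1 2 3; do
        echo "step $step done"
        sleep "$1"
    done
}

nph_headers
slow_report 1
```

Because the shell writes each line as soon as it is echoed, and the server
is not buffering or parsing anything, the reader sees the steps arrive
one at a time (browser permitting).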


2.7: Do I have to call it nph-*?

According to NCSA's reference pages, this is the standard for telling
the server that your script is NPH, so this should be a fully portable
convention.


2.8: What is the difference between GET and POST?

Firstly, the HTTP protocol specifies differing usages for the two
methods.   GET requests should always be idempotent on the server.
This means that whereas one GET request might (rarely) change some state
on the Server, two or more identical requests will have no further effect.

This is a theoretical point which is also good advice in practice.
If a user hits "reload" on his/her browser, an identical request will be
sent to the server, potentially resulting in two identical database or
guestbook entries, counter increments, etc.   Browsers may reload a
GET URL automatically, particularly if caching is disabled (as is usually
the case with CGI output), but will typically prompt the user before
re-submitting a POST request.   This means you're far less likely to get
inadvertently-repeated entries from POST.

GET is (in theory) the preferred method for idempotent operations, such
as querying a database, though it matters little if you're using a form.
There is a further practical constraint that many systems have builtin
limits to the length of a GET request they can handle: when the total size
of a request (URL+params) approaches or exceeds 1Kb, you are well-advised
to use POST in any case.

In terms of mechanics, they differ in how parameters are passed to the
CGI script.   In the case of a POST request, form data is passed on
STDIN, so the script should read from there (the number of bytes to be
read is given by the Content-length header, which the server passes to
the script in the environment variable CONTENT_LENGTH).   In the case of
GET, the data is passed in the environment variable QUERY_STRING.   The
encoding of the form data (application/x-www-form-urlencoded) is the same
for GET and POST requests.
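
A script that accepts either method can be sketched as follows.   The
function name read_form_data is hypothetical, and the data is left in its
URL-encoded form: decoding it is a separate job.

```shell
#!/bin/sh
# read_form_data: print the raw, still URL-encoded, form data
read_form_data() {
    case $REQUEST_METHOD in
        POST)
            # Read exactly CONTENT_LENGTH bytes from stdin.   Don't wait
            # for end-of-file: the server need not send one.
            dd bs=1 count="${CONTENT_LENGTH:-0}" 2>/dev/null
            ;;
        *)
            # GET (and HEAD): everything is in the query string
            printf '%s' "$QUERY_STRING"
            ;;
    esac
}

echo "Content-type: text/plain"
echo
# Show one name=value pair per line
read_form_data | tr '&' '\n'
```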


Section 3: Techniques: "How do I..."

This section comprises programming hints and tips for a number of popular
tasks. Also included are a number of common questions to which the answer
is "you can't", with the reasons why.

3.1: Can I get information about who is visiting?

*sigh*
Many people keep mailing me questions or suggested hacks to get
visitor information, particularly email addresses.   It seems they
won't take "NO" for an answer.

The bottom line is that whatever information is available to _you_
is _equally_ available to every spammer on the net.   Therefore when
a browser bug _does_ permit personal data to be collected, it gets
reported and fixed very quickly (one short-lived Netscape release
reportedly had such a bug).

You can get some limited information from the environment variables
passed to you by the browser.   Relatively few of these are guaranteed
to be available, and some may be misleading.   For particular types
of information, see below.   For full details, see NCSA's reference pages.


3.2: Can I get the email of visitors?

Why do you want to do this?

The best information available is the REMOTE_ADDR and REMOTE_HOST,
which tell you nothing about the user.   Techniques such as "finger@"
are not reliable, are widely disliked, and generally serve only to
introduce long delays in your CGI.   Better - as well as more polite -
just to ask your users to fill in a form.

BTW: the "From:" header line (HTTP_FROM variable) is usually only set
by robots, since human visitors to your webpage will not normally want
their addresses collected without permission and browsers respect this.


3.3: "But I saw some.kool.site display my email address..."

Some sites will play party tricks, which can get *some users'* email
addresses.   Possible tell-tale signs of this are inordinate delays
loading a page (fingering @REMOTE_HOST - doesn't often work but
probably can't be detected from the webpage), or a submit button that
appears to do nothing at all (a mailto: link - works quite well but
trivially detectable).   As a "snoop" party trick that's fine, but
if you find someone abusing these facilities (eg they send you
junkmail), alert their service provider!


3.4: Can I verify the email addresses people enter in my Form?

Unfortunately people will sometimes enter an incorrect or invalid
email address in your Form.   Worse, they may enter a valid but
incorrect email address that will deliver to someone who doesn't
want your mail.

Proposed regexps to match email addresses are sometimes posted.
Unfortunately these are almost always incorrect, and well-meaning
use of them can make life difficult for legitimate users with
unusual addresses.   For example, a (valid) email address of someone
whose Internet connection is via X.400 will look something like
S=N.OTHER/OU1=X12345A/RECIPNUM=1/MTA-BASIC@attmail.com.

A frequently-suggested hack that doesn't work is to use
SMTP EXPN or VRFY commands.   Modern versions of sendmail permit
administrators to disable these commands, and many sites take
advantage of this facility to protect their users' privacy.

Probably the best way to verify an email address is to send mail to
it, asking the user to respond.   Include a clause like "if you have
received this mail in error, please accept our apologies..."
You can then set up Procmail to handle the responses (but note that
separate reply and bounce addresses may lead to problems with broken
mailers replying to the wrong one - stick to one address if possible).
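
A sketch of the sending side is given below.   Everything specific in it
is an assumption for illustration: the function name, the From address,
the wording, and the presence of a sendmail-compatible mailer at
/usr/sbin/sendmail.   The only check made on the address is to refuse
line breaks, which would otherwise let a visitor smuggle extra headers
into the message.

```shell
#!/bin/sh
# confirmation_message ADDRESS: print a mail message asking for a reply.
# Returns non-zero (and prints nothing) if ADDRESS is empty or contains
# a line break.
confirmation_message() {
    nl='
'
    cr=$(printf '\r')
    case $1 in
        ''|*"$nl"*|*"$cr"*) return 1 ;;
    esac
    cat <<EOF
To: $1
From: webmaster@example.com
Subject: Please confirm your address

Someone entered this address in a form on our website.
To confirm, simply reply to this message.

If you have received this mail in error, please accept our
apologies and ignore it.
EOF
}

# To send it, with "-t" telling the mailer to take the recipient
# from the To: line:
#   confirmation_message "$address" | /usr/sbin/sendmail -t
```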


3.5: Can I get browser details and return different pages?

Why do you want to do this?

Well-written HTML will display correctly in any browser, so the correct
answer to this question is to design a template for your output in good
HTML, and make sure your output is correct.

If you insist on a different answer, you can use the HTTP_USER_AGENT
environment variable.  This requires care, and can lead to unexpected
results.   For example, checking for "Mozilla" and serving a frameset
to it ensures that you *also* serve the frameset to early (Non-Frame)
Netscapes, me-too browsers (notably Microsoft) and others who have
chosen to lie to you about their browser.

Note also that not every User Agent is a browser.   Your page may be
read by a user agent you've never heard of, and then displayed by
100 different browsers.   Or retrieved by different browsers from
a cache.   Another reason to write good HTML, and not try to
devise a clever or koool substitute.


3.6: Can I trace where a user has come from/is going to?

HTTP_REFERER might or might not tell you anything.   By all means
use it to collect partial statistics if you participate in (say)
an advertising banner scheme.   But it is not always set, and may
be meaningless (eg if a user has accessed your page from a bookmark,
and the browser is too dumb to cope with this).

You cannot trace outgoing links at all.   If you really must try,
point all the external links to your HTTPD and use its redirection
facility (which gives you generally-reliable logs).   This is much
more efficient than using a CGI script.

BTW: don't even think about asking Javascript to send you information
on some event: it's a violation of privacy which Netscape fixed as
soon as complaints about its abuse started coming in.   If it works
with *your* browser, you should upgrade!


3.7: Can I launch a long process and return a page before it's finished?

[UNIX]
You have to fork/spawn the long-running process.
The important thing to remember is to close all its file descriptors;
otherwise nothing will be returned to the browser until it's finished.
The standard trick to accomplish this is redirection to/from /dev/null:

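        # Perl: use system(), not exec() - exec would replace this script
        system ("long_process < /dev/null > /dev/null 2>&1 &");
        print "Content-type: text/html\n\n";
        print "<p>Your job has been started.</p>\n";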
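
In a shell script the trick can be sketched like this.   The function
name start_detached is my own, and "sleep 5" merely stands in for the
real long-running job.

```shell
#!/bin/sh
# start_detached COMMAND [ARGS...]
# Run a job in the background with stdin, stdout and stderr all pointed
# at /dev/null, so that it holds none of the descriptors the server is
# waiting on.
start_detached() {
    "$@" < /dev/null > /dev/null 2>&1 &
}

start_detached sleep 5      # stand-in for the real long-running job

echo "Content-type: text/html"
echo
echo "<p>Your job has been started.</p>"
```

Leave out the redirections and the page will not be delivered until the
background job exits, because the job still holds the script's output
open.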


3.8: Can I launch a long process which the user interacts with?

This does not fit well with the basic mechanics of the Web, in which
each transaction comprises a single request and response.
If your processing can be done on the Client machine, you can use
a clientside application; for example a Java applet.

For processing on the server, one trick that works well for Clients
running an X server (and far, far more efficient than a JAVA solution) is:
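  if ( fork() ) {
    # parent: print HTML page explaining what's going on and advising about xhost
  } else {
    # child: becomes the xterm; the redirections detach it from the HTTPD.
    # Check THEIR_DISPLAY carefully before letting it anywhere near a shell!
    exec ("xterm -display THEIR_DISPLAY -title MY_APP -e MY_PROG ARGS " .
        "< /dev/null > /dev/null 2>&1") ;
  }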
NOTE: THEIR_DISPLAY is not necessarily the same as REMOTE_HOST or REMOTE_ADDR.
You have to ask users to supply their display (offer REMOTE_HOST as the default).


3.9: Can I password-protect my pages?

Yes.   Use your HTTPD's authentication, just as you would for a basic HTML page.
Now you'll have the identity of every visitor in REMOTE_USER.
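
A sketch of a script making use of this is given below.   It assumes the
script lives somewhere your server has been told to protect; the function
name whoami_page and the wording are mine.

```shell
#!/bin/sh
# whoami_page: greet the authenticated user by name.
# REMOTE_USER is set by the server only after it has checked the password.
whoami_page() {
    echo "Content-type: text/plain"
    echo
    if [ -n "$REMOTE_USER" ]; then
        printf 'Hello, %s.\n' "$REMOTE_USER"
    else
        # No REMOTE_USER means the server did no authentication here
        echo "This script is not password-protected: check your server setup."
    fi
}

whoami_page
```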


3.10: Can I do HTTP authentication using CGI?

It depends on which version of the question you asked.

Yes, you can use CGI to trigger the browser's standard Username/Password
dialogue.   Send a response code 401, together with a "WWW-authenticate"
header including details of the authentication scheme and realm:
e.g. (in a non-NPH script)

	Status: 401 Unauthorized to access the document
	WWW-authenticate: Basic realm="foobar"
	Content-type: text/plain

	Unauthorised to access this document

What use you can make of this is server-dependent, and it is harder
than it looks, since most servers expect to deal with authentication
before ever reaching the CGI (eg through .www_acl or .htaccess).
Thus it cannot usefully replace the standard login sequence, although
it can be applied to other situations, such as re-validating a user -
e.g. after a certain timeout period, or if the same person may need to
log in under more than one userid.

What you can never get in CGI is the credentials returned by the user.
The HTTPD takes care of this, and simply sets REMOTE_USER to the
username if the correct password was entered.


3.11: Can I identify users/sessions without password protection?

The most usual (but browser-dependent) way to do this is to set a cookie.
If you do this, you are accepting that not all users will have a 'session'.

An alternative is to pass a session ID in every GET URL, and in hidden
fields of POST requests.   This can be a big overhead unless _every_ page
requires CGI in any case.

Another alternative is the Hyper-G solution of encoding a session-id in
the URLs of pages returned:
	http://hyper-g.server/session_id/real/path/to/page
This has the drawback of making the URLs very confusing, and causes any
bookmarked pages to generate old session_ids.

Note that a session ID based solely on REMOTE_HOST (or REMOTE_ADDR)
will NOT work, as multiple users may access your pages concurrently
from the same machine.
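
The cookie approach can be sketched as follows.   The cookie name
"session", the function names, and the use of /dev/urandom as a source of
IDs are all assumptions made for the example; it also takes for granted
that the server passes the browser's Cookie header as HTTP_COOKIE.

```shell
#!/bin/sh
# session_from_cookie COOKIE_HEADER
# Print the value of the "session" cookie, or nothing if there isn't one.
session_from_cookie() {
    printf '%s\n' "$1" | tr ';' '\n' | sed -n 's/^ *session=//p' | head -n 1
}

# new_session_id: 16 hex digits read from /dev/urandom
new_session_id() {
    od -An -N8 -tx1 /dev/urandom | tr -d ' \n'
}

session=$(session_from_cookie "$HTTP_COOKIE")
if [ -z "$session" ]; then
    # First visit (or a browser without cookies): start a new session
    session=$(new_session_id)
    echo "Set-cookie: session=$session; path=/"
fi
echo "Content-type: text/plain"
echo
echo "Your session is $session"
```

A browser that ignores cookies gets a fresh ID on every request, which
is the "not all users will have a session" caveat in action.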


3.12: Can I redirect users to another page?

For permanent and simple redirection, use the HTTPD configuration file:
it's much more efficient than doing it yourself.   Some servers enable
you to do this using a file in your own directory (eg Apache) whereas
others use a single configuration file (eg CERN).

For more complicated cases (eg process form inputs and conditionally
redirect the user), use the "Location:" response header.
If the target of the redirection is itself a CGI script, it is easy to
pass parameters to it URLencoded in a GET request, but don't forget to
escape the URL!
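
By way of illustration, here is a sketch of a redirect that carries one
escaped parameter.   Both function names and the target URL are
hypothetical, and the escaping routine copes with plain ASCII only:
treat it as a demonstration of the idea rather than a finished encoder.

```shell
#!/bin/sh
# urlencode STRING
# Percent-encode every character except letters, digits and "-._~".
# ASCII input only.
urlencode() {
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}             # everything after the first character
        c=${s%"$rest"}          # the first character itself
        case $c in
            [A-Za-z0-9._~-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
        s=$rest
    done
    printf '%s' "$out"
}

# redirect_with_param URL NAME VALUE
redirect_with_param() {
    printf 'Location: %s?%s=%s\n\n' "$1" "$2" "$(urlencode "$3")"
}

redirect_with_param "http://your.server/cgi-bin/next.cgi" name "Fred Bloggs & Co"
```

Without the escaping, the space and the "&" in the value would be read
as the end of the URL and the start of a second parameter respectively.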


3.13: Can I run a CGI script without returning a new page to the browser?

Yes, but think carefully first:  How are your readers going to know
that their "submit" has succeeded?   They may hit 'submit' many times!

The correct solution according to the HTTP specification is to
return HTTP status code 204.   As an NPH script, this would be:

	#!/bin/sh
	# do processing (or launch it as background job)
	echo "HTTP/1.0 204 No Content"
	echo

Alan J Flavell has pointed out that this will fail with certain
popular browsers, and suggests a workaround to accommodate them:

> 1. Send status 204, Content-type of text/html, and a short body content
> that (for those few browsers that display it) will tell the reader that
> their browser does not handle this reponse correctly, and invites them
> to use their browser's Back function (hey, if someone tells me to put
> a back button on the HTML page itself, I think I shall scream...).

His survey is at
http://ppewww.ph.gla.ac.uk/%7Eflavell/status204/results.html
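
As an ordinary (non-NPH) script, the workaround might be sketched like
this.   The function name and the wording of the message are mine, and
whether a given server passes a body through with a 204 is something you
would have to test.

```shell
#!/bin/sh
# no_content_response: status 204 plus a short body for the browsers
# that display it anyway.
no_content_response() {
    echo "Status: 204 No Content"
    echo "Content-type: text/html"
    echo
    echo "<p>Your browser does not handle this response correctly."
    echo "Please use its Back function to return to the form.</p>"
}

# do processing here (or launch it as a background job)
no_content_response
```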


3.14: Can I write output to a different Netscape frame?

Yep.   The fact you're using CGI makes no difference: use
"target=" in your links as usual.   Alternatively, the script
can print a "Window-target:" header.   Read Netscape's pages
for detail: these answer all the questions about things like
"getting rid of" or "breaking out of" frames, too.


3.15: Can I write output to several frames at once?

A single CGI script can only ever print to one frame.

However, this limitation may be overcome by using more than one script.
The first script (the URL of the "submit" button) prints a frameset,
typically to a "_parent" or "_top" target.   The sources for one or
more of the frames thus generated may also be CGI scripts, to which
you can easily pass parameters (eg encoded in URLs with method GET).
This hack is definitely not recommended.   If you find yourself wanting
to update several frames from a single user event, it probably means
you should review the design of your application at a higher level.

Warnings:
 1. Don't forget to escape your URLs.
 2. This technique results in your server being hit by multiple 
    concurrent CGI requests.   You'll need LOTS of memory, especially
    if you use a memory-hog like Perl.   It can be a good recipe
    for bringing a server to its knees.
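
For what it's worth, the first script in such a scheme might look
something like the sketch below.   The script paths /cgi-bin/menu and
/cgi-bin/detail, the parameter name "id" and the frame names are all
hypothetical, and the value passed in is assumed to be URL-encoded
already.   The form that calls it would use target="_top" or
target="_parent".

```shell
#!/bin/sh
# Print a frameset whose frames are themselves CGI scripts.
# $1 = an already URL-encoded value handed on to both frame scripts.
print_frameset() {
    printf 'Content-type: text/html\n\n'
    cat <<EOF
<html><head><title>Results</title></head>
<frameset cols="30%,70%">
  <frame name="menu" src="/cgi-bin/menu?id=$1">
  <frame name="detail" src="/cgi-bin/detail?id=$1">
</frameset>
</html>
EOF
}

# Usage (hypothetical value):
#   print_frameset "abc%20def"
```

Each frame source is then fetched as a separate request, which is
exactly where the extra server load comes from.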

JavaScript is often a valid alternative here, but note just how silly
it can (and often does) look in a different browser.

[Table of Contents] [Index]

3.16: Can I use a CGI script to generate both text and inline images?

Not directly.   One script generates one response to one request.

If you want to generate a dynamic page including dynamic images
(say, a report including graphs, all of which depend on user input)
then your primary script will print the usual
   <img src="[script-to-generate-image]" alt="[what you asked for]">
and, just as in the multiple frames case, you can pass data to the
image-generating program encoded in a GET URL.   Of course, the same
caveats apply: see above.
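
A sketch of the primary script's side of this, in shell.   The image
script /cgi-bin/graph and its parameters are hypothetical, and the
query string is assumed to be URL-encoded before it gets here; the
helper only handles the HTML escaping needed inside attributes.

```shell
#!/bin/sh
# Escape the characters that are special inside HTML attribute values.
html_escape() {
    printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' \
                           -e 's/>/\&gt;/g' -e 's/"/\&quot;/g'
}

# Print a page whose inline image comes from a second CGI script.
# $1 = URL-encoded query string for the (hypothetical) image script
# $2 = plain-text description, used as the alt text
print_report_page() {
    printf 'Content-type: text/html\n\n'
    printf '<html><body>\n<h1>Report</h1>\n'
    printf '<img src="/cgi-bin/graph?%s" alt="%s">\n' \
        "$(html_escape "$1")" "$(html_escape "$2")"
    printf '</body></html>\n'
}

# Usage (hypothetical parameters):
#   print_report_page "year=1996&type=bar" "Bar chart for 1996"
```

The browser then makes a second request for the image, and the image
script answers it with an image Content-type and the image data.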

[Table of Contents] [Index]

3.17: How can I use Caches to make CGI scripts faster and more Net-friendly?

This is currently beyond the scope of this FAQ (whose author urgently
needs to improve his own applications in this regard).   However,
there is an excellent introduction to net-friendly webpages, including
CGI pages, at http://vancouver-webpages.com/CacheNow/

A sample caching perl/cgi script by Andrew Daviel is available at
http://vancouver-webpages.com/proxy/log-tail.pl

[Table of Contents] [Index]

Section 4: Applications: Is there an existing script to ...

There are a lot of applications available.   For all the tasks
listed here, there are free systems you can download and install
yourself (at least if you're on UNIX).   Many are excellent.

Before ever *buying* software, do a Net search on what you want and
check what freeware is available.   Does the commercial system you
had in mind *really* have any advantages?   If you can't follow
the jargon they use to explain the merits of their system, insist
on some clarification (hey, that's not just for Web software :-)

Most questions under this heading are probably best answered by
reference to appropriate review sites on the Web (in many cases,
Thomas Boutell's WWW FAQ).   In cases where I know of one or more
good sites, I've referenced them.

4.1: Where to look for free scripts for my application?

Some popular places to look for a wide range of free CGI applications are:

Selena Sol's Public Domain CGI Scripts
http://www2.eff.org/~erict/Scripts/scripts.html

Matt Wright's Script Archive
http://www.worldwidemart.com/scripts/

Dale Bewley has a much longer list of script archives
(along with his own scripts) at
http://www.engr.iupui.edu/~dbewley/perl/

[Table of Contents] [Index]

4.2: Discussion group/bulletin board

David R Woolley maintains a list, currently of around 100 systems, at
http://freenet.msp.mn.us/~drwool/webconf.html
("Conferencing on the Web").

[Table of Contents] [Index]

4.3: CSCW/Groupware

There are several overview sites for this.   A few are:

The CSCW Yellow Pages, at
http://www11.informatik.tu-muenchen.de/cscw/yp/YP-index-type.html

NCSA Web Collaboration pages, at
http://union.ncsa.uiuc.edu/HyperNews/get/www/collaboration.html

[Table of Contents] [Index]

4.4: Database

This subject deserves its own FAQ.   When someone recently asked about one,
Matthew.Healy@yale.edu (Matthew D. Healy) posted this answer (slightly chopped):

> : Is there a CGI and Database FAQ available?
> : If so, could someone tell me where can I get it?
> 
> Dunno about a FAQ on that.  I can recommend a couple of published
> works, however:
> 
> 1. I wrote a chapter about CGI/Database work for the book
> {Special Edition Using CGI}.  Fulltext is online at the
> publisher's WWW site:
> 
> http://www.mcp.com/que/et/se_cgi/  The book
> http://www.mcp.com/que/et/se_cgi/Cgi13fi.htm  My chapter on WWW/DBMS
> 
> 2. Jeff Rowe wrote an excellent book, {Building Internet Database
> Servers With CGI}.  URL for more info:
> 
> http://cscsun1.larc.nasa.gov/~beowulf/db/existing_products.html
> 
> Jeff's WWW site has scads of useful information on WWW/DBMS programming,
> and pointers to lots more sites.

Matthew's CGI links page at http://ycmi.med.yale.edu/~healy/cgilinks.html
expands the list, and includes links to popular packages including
Bo Frese Rasmussen's WDB at http://venus.dtv.dk/~bfr/wdb/

[Table of Contents] [Index]

Section 5: Troubleshooting a CGI application

Since this subject is quite well covered by other documents, this FAQ has
relatively little to say.

Tom Christiansen's "Idiot's guide to solving Perl/CGI problems" is a
slightly tongue-in-cheek list of common problems, and how to track
them down.  Much of what Tom covers is not specifically Perl, but
applies equally to CGI programming in other languages. 

Marc Hedlund's CGI FAQ and Thomas Boutell's WWW FAQ also
deal with this subject. 

See "Further Reading" below (if you don't already know where to find these
documents).

5.1: Are there some interactive debugging tools and services available?

If you're using Perl, get Lincoln Stein's CGI.pm module.   I cannot
recommend this more highly: in addition to making some quite advanced
perl/CGI programming as easy as HelloWorld, it offers an interactive
debugging mode.
http://www-genome.wi.mit.edu/ftp/pub/software/WWW/cgi_docs.html

Nathan Neulinger's cgiwrap is another package with debugging aids.
http://www.umr.edu/~cgiwrap/

See also the next question.

[Table of Contents] [Index]

5.2: I'm having trouble with my headers. What can I do?

For simple cases, examining your response headers "by hand" may suffice:
(1) telnet to the host and port where the server is running - e.g.
        telnet www.myhost.com 80
(2) Enter an HTTP request.   The most useful for this purpose is usually HEAD; eg
        HEAD /index.html HTTP/1.0
        (optionally other headers)
        (followed by a blank line)
Now you'll get a full HTTP response header back.
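
If you do this often, the same two steps can be scripted.   The sketch
below only builds the request text; sending it assumes you have netcat
installed as "nc", which is an assumption about your system rather than
anything this FAQ relies on.

```shell
#!/bin/sh
# Build a minimal HTTP/1.0 HEAD request for the path in $1.
# Each line ends in CRLF, and the empty line ends the request.
head_request() {
    printf 'HEAD %s HTTP/1.0\r\n\r\n' "$1"
}

# Usage, assuming netcat is available as "nc":
#   head_request /index.html | nc www.myhost.com 80
```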

For complex cases, such as sending a request with several headers
(as a browser does) or POSTing a form, there is a free diagnosis
service at the WebThing WebCentre.   This will take a request from your
browser (eg form inputs) and forward the identical request to your
server, printing a full report of your request (request headers and
form data) and the response from your server (response headers and data).
http://pobox.com/~webthing/

[Table of Contents] [Index]

Section 6: Further Reading

6.1: Other FAQs/collections (including online book)

****	Lincoln Stein's WWW Security FAQ is probably the	****
****	most important WWW document you will ever read.	****

Special Edition Using CGI (full book text available online)
http://www.mcp.com/que/et/se_cgi/

The Web Authoring FAQ by 'Galactus' Engelfriet and John Pozadzides
http://htmlhelp.com/links/wdgfaq.htm
(although at the time of writing the online version appears to be a little
behind the updated drafts posted).

For general WWW issues, the World Wide Web FAQ by Thomas Boutell
http://www.boutell.com/faq/

Another CGI FAQ, by Marc Hedlund
http://www.best.com/~hedlund/cgi-faq/

Perl/CGI programming FAQ, by Shishir Gundavaram and Tom Christiansen
http://www.perl.com/perl/faq/perl-cgi-faq.html

The Idiot's Guide to solving Perl/CGI problems by Tom Christiansen
http://www.perl.com/perl/faq/idiots-guide.html

The WWW Security FAQ by Lincoln Stein
http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html

The WWW Virtual Library
http://WWW.Stars.com/Vlib/

[Table of Contents] [Index]

6.2: Reference Pages

The Common Gateway Interface (CGI)
http://www.ast.cam.ac.uk/%7Edrtr/cgi-spec.html
http://hoohoo.ncsa.uiuc.edu/cgi/interface.html

HyperText Transfer Protocol (HTTP)
http://www.w3.org/pub/WWW/Protocols/HTTP/

HyperText Markup Language (HTML)
http://www.w3.org/pub/WWW/MarkUp/

Up to Table of Contents

INDEX

The index is generated from an arbitrary list of keywords. If I've missed anything obvious that should be here, please let me know.